Draft Amendments:
Claim 1:

A method of clustering documents each having one or plural document segments in an 
input document set, said method comprising the following steps: 

(a) obtaining a co-occurrence matrix for each input document which is a matrix
reflecting the occurrence frequencies of terms and the co-occurrence frequencies of term
pairs, and obtaining an input document frequency matrix for the set of input documents
based on occurrence frequencies of terms or term pairs appearing in the set of input
documents wherein said step (a) further includes:

(a-1) generating an input document segment vector for each of said input
document segments based on occurrence frequencies of terms appearing in each input 
document segment; 

(a-2) obtaining the co-occurrence matrix for each input document from the input
document segment vectors; and

(a-3) obtaining the input document frequency matrix from the co-occurrence
matrix for each document;

(b) selecting a seed document from a set of remaining documents that are not 
included in any cluster existing at that moment, and constructing a current cluster of an 
initial state based on the seed document, wherein said selecting and constructing 
comprises: 

(b-1) constructing a remaining document common co-occurrence matrix 
for the set of the remaining documents based on a product of corresponding 
components of the co-occurrence matrices of all documents in the set of
remaining documents; and 

(b-2) obtaining a document commonality of each remaining document to 
the set of the remaining documents based on a product sum between every 
component of the co-occurrence matrix of each remaining document and the 
corresponding component of the remaining document common co-occurrence 
matrix; 

(b-3) extracting, as the seed document, the document having the highest 
document commonality to the set of the remaining documents; and 

(b-4) constructing the initial cluster by including the seed document and 
neighbor documents similar to the seed document; 

(c) making documents, which have the document commonality to the current 
cluster higher than a threshold, belong temporarily to the current cluster; 
wherein said making comprises:

(c-1) constructing a current cluster common co-occurrence matrix for the 
current cluster and a current cluster document frequency matrix of the current 
cluster based on occurrence frequencies of terms or term pairs appearing in the 
documents of the current cluster; 

(c-2) obtaining a distinctiveness value of each term and each term pair for 
the current cluster by comparing the input document frequency matrix with the 
current cluster document frequency matrix; 

(c-3) obtaining weights of each term and each term pair from their 
distinctiveness values; 
(c-4) obtaining a document commonality to the current cluster for each
document in the input document set based on a product sum between every 
component of the co-occurrence matrix of the input document and the 
corresponding component of the current cluster common co-occurrence matrix 
while applying the respective weights to said components; and 

(c-5) making documents having the document commonality to the current 
cluster higher than the threshold belong temporarily to the current cluster; 

(d) repeating step (c) until the number of documents temporarily belonging to the 
current cluster does not increase; 

(e) repeating steps (b) through (d) until a given convergence condition is 
satisfied; 

and 

(f) deciding, on the basis of the document commonality of each document to each 
cluster, a cluster to which each document belongs and outputting said cluster. 
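
Illustrative sketch, not claim language: steps (c-4) and (c-5) of claim 1 can be pictured as a weighted product sum followed by a threshold test. The function names, the toy matrices, and the absence of any normalization are assumptions made for this example; the claim does not specify how the weights are derived from the distinctiveness values, so they are simply passed in.

```python
def weighted_commonality(doc_matrix, cluster_matrix, weights):
    """Weighted product sum over every (m, n) component, in the spirit of step (c-4).

    All three arguments are square lists of lists of equal size (an assumption).
    """
    total = 0.0
    for s_row, c_row, w_row in zip(doc_matrix, cluster_matrix, weights):
        for s, c, w in zip(s_row, c_row, w_row):
            total += w * s * c
    return total


def temporary_members(doc_matrices, cluster_matrix, weights, threshold):
    """Indices of documents whose commonality exceeds the threshold, as in step (c-5)."""
    return [
        i
        for i, s in enumerate(doc_matrices)
        if weighted_commonality(s, cluster_matrix, weights) > threshold
    ]


# Hypothetical toy data: 2 terms, 3 documents.
docs = [
    [[2, 1], [1, 1]],
    [[0, 0], [0, 3]],
    [[1, 1], [1, 2]],
]
cluster = [[4, 2], [2, 1]]
weights = [[1.0, 0.5], [0.5, 0.0]]
members = temporary_members(docs, cluster, weights, threshold=5.0)
```

In this toy setup the second document shares only a zero-weighted term with the cluster, so it falls below the threshold while the other two are admitted temporarily.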

Claim 2 (canceled) 

Claim 5 : 

The clustering method according to claim 1 , wherein the convergence condition in said 
step (e) is satisfied when 

(i) the number of documents whose document commonalities to any current 
clusters are less than a threshold becomes 0, or 
(ii) the number is less than a threshold and does not increase.
Claim 6 : 

The clustering method according to claim 1, wherein said step (f) further includes: 
checking existence of a redundant cluster, and removing, when the redundant cluster
exists, the redundant cluster and again deciding the cluster to which each document 
belongs. 

Claim 7 : 

A method of clustering documents each having one or plural document segments in an 
input document set, said method comprising the following steps:
(a) obtaining a co-occurrence matrix S^r for each input document D_r based on occurrence
frequencies of terms or term pairs appearing in the set of input documents; 

wherein in step (a), each mn component S^r_{mn} of the co-occurrence matrix S^r of the
document D_r is determined in accordance with:

S^r_{mn} = \sum_{y=1}^{Y_r} d_{rym} d_{ryn}



where: 

m and n denote the m-th and n-th terms, respectively, among M terms appearing in the
set of input documents,

D_r is the r-th document in a document set D consisting of R documents;

Y_r is the number of document segments in document D_r, wherein d_{rym} and d_{ryn}
denote the existence or absence of the m-th and n-th terms, respectively, in the y-th document
segment of document D_r, and

S^r_{mm} represents the number of document segments in which the m-th term occurs
and S^r_{mn} represents the co-occurrence counts of document segments in which the m-th
and n-th terms co-occur;

(b) selecting a seed document from a set of remaining documents that are not included in 
any cluster existing at that moment, and constructing a current cluster of an initial state 
based on the seed document, wherein said selecting and constructing comprise: 

(b-1) constructing a remaining document common co-occurrence matrix T^A for
the set of the remaining documents based on the co-occurrence matrices of all documents 
in the set of remaining documents; 

(b-2) obtaining a document commonality of each remaining document to the set 
of the remaining documents based on the co-occurrence matrix S^r of each remaining
document and the remaining document common co-occurrence matrix T^A;

(b-3) extracting, as the seed document, the document having the highest 
document commonality to the set of the remaining documents; and 

(b-4) constructing the initial cluster by including the seed document and 
neighbor documents similar to the seed document; 

(c) making documents having the document commonality higher than a threshold belong 
temporarily to the current cluster; 

(d) repeating step (c) until the number of documents temporarily belonging to the current 
cluster does not increase; 

(e) repeating steps (b) through (d) until a given convergence condition is satisfied; 
and 
(f) deciding, on the basis of the document commonality of each document to each cluster,
a cluster to which each document belongs and outputting said cluster.
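
Illustrative sketch, not claim language: the definitions in step (a) of claim 7 say that the diagonal component for term m counts the document segments containing that term and the off-diagonal component for terms m and n counts the segments in which both occur. Assuming each segment is given as a binary presence vector, this can be computed as below; the function name and the sample segments are hypothetical.

```python
def cooccurrence_matrix(segments, num_terms):
    """Build the co-occurrence matrix of one document.

    segments: list of binary vectors, one per document segment, where
    entry m is 1 if the m-th term occurs in that segment and 0 otherwise.
    """
    s = [[0] * num_terms for _ in range(num_terms)]
    for d in segments:
        for m in range(num_terms):
            for n in range(num_terms):
                # Adds 1 only when both terms are present in this segment.
                s[m][n] += d[m] * d[n]
    return s


# Hypothetical document with three segments over a three-term vocabulary.
segments = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
S = cooccurrence_matrix(segments, 3)
```

The resulting matrix is symmetric, with segment counts per term on the diagonal.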



Claim 9 : 

The method according to claim 7, wherein in step (b-1), the remaining document 
common co-occurrence matrix T^A is determined on the basis of a matrix T;
wherein the matrix T has an mn component determined by 

T_{mn} = \prod_{r,\, S^r_{mn} > 0} S^r_{mn},



the matrix T^A has an mn component determined by
T^A_{mn} = T_{mn} when U_{mn} > A,
T^A_{mn} = 0 otherwise,
where 

U_{mn} represents an mn component of a document frequency matrix of the set of remaining
documents wherein U_{mm} denotes the number of remaining documents in which the m-th
term occurs and U_{mn} denotes the number of remaining documents in which the m-th and n-th
terms co-occur; and

A denotes a predetermined threshold. 
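
Illustrative sketch, not claim language: the thresholding in claim 9 can be pictured as follows. The sketch assumes that T is the component-wise product of the positive components of the remaining documents' co-occurrence matrices (as step (b-1) of claim 1 describes) and that U counts, per component, the remaining documents whose co-occurrence matrix is positive there. The function name and sample data are hypothetical.

```python
def common_cooccurrence(doc_matrices, threshold_a):
    """Return (T^A, U) for a non-empty list of square co-occurrence matrices."""
    size = len(doc_matrices[0])
    t = [[1] * size for _ in range(size)]  # running products
    u = [[0] * size for _ in range(size)]  # document frequencies
    for s in doc_matrices:
        for m in range(size):
            for n in range(size):
                if s[m][n] > 0:
                    t[m][n] *= s[m][n]
                    u[m][n] += 1
    # Keep a component only when enough documents contribute to it.
    t_a = [
        [t[m][n] if u[m][n] > threshold_a else 0 for n in range(size)]
        for m in range(size)
    ]
    return t_a, u


# Hypothetical set of three remaining documents over two terms.
remaining = [
    [[2, 1], [1, 1]],
    [[1, 0], [0, 3]],
    [[3, 2], [2, 2]],
]
t_a, u = common_cooccurrence(remaining, threshold_a=2)
```

With A = 2, only components supported by all three documents survive, so the off-diagonal entries (shared by two documents) are zeroed.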

Claim 10: 

The method according to claim 9, further comprising: 

determining a modified common co-occurrence matrix Q^A on the basis of T^A; and
in step (b-2), obtaining the document commonality of each remaining document to the set
of the remaining documents based on the co-occurrence matrix S^r of each remaining
document and the modified common co-occurrence matrix Q^A;

the matrix Q A having an mn component determined by 

Q^A_{mn} = log T^A_{mn} when T^A_{mn} > 1,

Q^A_{mn} = 0 otherwise.
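
Illustrative sketch, not claim language: the mapping from T^A to Q^A in claim 10 is a component-wise logarithm with a cutoff at 1. The claim does not state the base of the logarithm; the natural logarithm is assumed here, and the function name and sample matrix are hypothetical.

```python
import math


def modified_common_matrix(t_a):
    """Component-wise log of T^A where the component exceeds 1, else 0."""
    return [[math.log(v) if v > 1 else 0.0 for v in row] for row in t_a]


# Hypothetical T^A with a zeroed component and a component equal to 1.
q_a = modified_common_matrix([[6, 0], [1, 8]])
```

Components equal to 0 or 1 both map to 0, so only term pairs with a product above 1 contribute to the commonality.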

Claim 11 : 

The method according to claim 10, wherein in step (b-2),

the document commonality of each remaining document P having a co-occurrence matrix
S^P with respect to the set of remaining documents is given by



Claim 12 : 

The method according to claim 10, wherein in step (b-2), the document commonality of 
each remaining document P having a co-occurrence matrix S^P with respect to the set of
remaining documents is given by 
Claims 23-24 (canceled)
Claims 27-28 (canceled) 
Claims 29-31 (canceled) 



